AITopics | statistical significance

Collaborating Authors

statistical significance

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Quantifying Statistical Significance of Deep Nearest Neighbor Anomaly Detection via Selective Inference

Neural Information Processing Systems, Jun-23-2026, 04:10:06 GMT

In real-world applications, anomaly detection (AD) often operates without access to anomalous data, necessitating semi-supervised methods that rely solely on normal data. Among these methods, deep k-nearest neighbor (deep kNN) AD stands out for its interpretability and flexibility, leveraging distance-based scoring in deep latent spaces. Despite its strong performance, deep kNN lacks a mechanism to quantify uncertainty, an essential feature for critical applications such as industrial inspection. To address this limitation, we propose a statistical framework that quantifies the significance of detected anomalies in the form of p-values, thereby enabling control over false positive rates at a user-specified significance level (e.g., 0.05). A central challenge lies in managing selection bias, which we tackle using Selective Inference, a principled method for conducting inference conditioned on data-driven selections. We evaluate our method on diverse datasets and demonstrate that it provides reliable AD well-suited for industrial use cases.
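To make the idea of a p-value for a kNN anomaly score concrete, here is a minimal sketch. It is not the paper's Selective Inference procedure and does not address the selection bias the paper targets: it swaps in split-conformal p-values, which rank a test point's kNN score among the scores of held-out normal samples and, assuming the normal data are exchangeable, flag a normal point with probability at most α. The inputs are plain NumPy arrays assumed to be embeddings from some pretrained encoder; the function names `knn_scores` and `conformal_p_values` are hypothetical.

```python
import numpy as np


def knn_scores(reference: np.ndarray, queries: np.ndarray, k: int) -> np.ndarray:
    """Mean Euclidean distance from each query to its k nearest reference points."""
    # Pairwise distances, shape (n_queries, n_reference).
    dists = np.linalg.norm(queries[:, None, :] - reference[None, :, :], axis=-1)
    # The k smallest distances in each row (their order does not matter for the mean).
    nearest = np.partition(dists, k - 1, axis=1)[:, :k]
    return nearest.mean(axis=1)


def conformal_p_values(
    train: np.ndarray, calib: np.ndarray, test: np.ndarray, k: int = 5
) -> np.ndarray:
    """Split-conformal p-values: rank each test score among held-out normal scores."""
    calib_scores = knn_scores(train, calib, k)
    test_scores = knn_scores(train, test, k)
    n = len(calib_scores)
    # Number of calibration scores at least as extreme as each test score.
    counts = (calib_scores[None, :] >= test_scores[:, None]).sum(axis=1)
    return (1 + counts) / (n + 1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    train = rng.normal(size=(500, 2))  # stand-in for embeddings of normal data
    calib = rng.normal(size=(200, 2))  # held-out normal embeddings
    test = np.vstack([rng.normal(size=(5, 2)), [[8.0, 8.0]]])  # last row is far away
    p = conformal_p_values(train, calib, test)
    alpha = 0.05
    print(np.round(p, 3), p <= alpha)  # flag a point only when p <= alpha
```

Flagging points with `p <= alpha` bounds the chance that any single normal point is flagged; it does not by itself correct for testing many points at once.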

artificial intelligence, data mining, machine learning

Neural Information Processing Systems

Genre: Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Banking & Finance (0.67)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

1b8726b572e0dfa72793f9f6590664fd-Supplemental-Datasets_and_Benchmarks_Track.pdf

Neural Information Processing Systems, Feb-8-2026, 21:01:11 GMT

ai algorithm, algorithm, perform statistical comparison

Neural Information Processing Systems

Country:

Asia > China > Hong Kong (0.04)
Asia > China > Shanghai > Shanghai (0.04)
North America > United States > New York > Suffolk County > Stony Brook (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Diagnostic Medicine > Imaging (0.48)
Health & Medicine > Therapeutic Area > Oncology (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Natural Language (0.67)

Generalist Large Language Models Outperform Clinical Tools on Medical Benchmarks

Vishwanath, Krithik, Ghosh, Mrigayu, Alyakin, Anton, Alber, Daniel Alexander, Aphinyanaphongs, Yindalon, Oermann, Eric Karl

arXiv.org Artificial Intelligence, Dec-2-2025

Specialized clinical AI assistants are rapidly entering medical practice, often framed as safer or more reliable than general-purpose large language models (LLMs). Yet, unlike frontier models, these clinical tools are rarely subjected to independent, quantitative evaluation, creating a critical evidence gap despite their growing influence on diagnosis, triage, and guideline interpretation. We assessed two widely deployed clinical AI systems (OpenEvidence and UpToDate Expert AI) against three state-of-the-art generalist LLMs (GPT-5, Gemini 3 Pro, and Claude Sonnet 4.5) using a 1,000-item mini-benchmark combining MedQA (medical knowledge) and HealthBench (clinician-alignment) tasks. Generalist models consistently outperformed clinical tools, with GPT-5 achieving the highest scores, while OpenEvidence and UpToDate demonstrated deficits in completeness, communication quality, context awareness, and systems-based safety reasoning. These findings reveal that tools marketed for clinical decision support may often lag behind frontier LLMs, underscoring the urgent need for transparent, independent evaluation before deployment in patient-facing workflows.

generalist model, large language model, machine learning

arXiv.org Artificial Intelligence

2512.01191

Country:

North America > United States > New York (0.23)
North America > United States > Texas > Travis County > Austin (0.16)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.77)

Industry:

Health & Medicine > Health Care Providers & Services (0.96)
Health & Medicine > Therapeutic Area (0.69)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Fairness Evaluation of Large Language Models in Academic Library Reference Services

Wang, Haining, Clark, Jason, Yan, Yueru, Bradley, Star, Chen, Ruiyang, Zhang, Yiqiong, Fu, Hengyi, Tian, Zuoyu

arXiv.org Artificial Intelligence, Nov-24-2025

As libraries explore large language models (LLMs) for use in virtual reference services, a key question arises: Can LLMs serve all users equitably, regardless of demographics or social status? While they offer great potential for scalable support, LLMs may also reproduce societal biases embedded in their training data, risking the integrity of libraries' commitment to equitable service. To address this concern, we evaluate whether LLMs differentiate responses across user identities by prompting six state-of-the-art LLMs to assist patrons differing in sex, race/ethnicity, and institutional role. We find no evidence of differentiation by race or ethnicity, and only minor evidence of stereotypical bias against women in one model. LLMs demonstrate nuanced accommodation of institutional roles through the use of linguistic choices related to formality, politeness, and domain-specific vocabularies, reflecting professional norms rather than discriminatory treatment. These findings suggest that current LLMs show a promising degree of readiness to support equitable and contextually appropriate communication in academic library reference services.

large language model, machine learning, patron type

arXiv.org Artificial Intelligence

2507.04224

Country:

Europe (0.93)
Asia > China (0.67)
North America > United States > Indiana (0.28)
North America > United States > California (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study > Negative Result (0.34)

Industry:

Health & Medicine (1.00)
Education (1.00)
Information Technology (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Expertise and confidence explain how social influence evolves along intellective tasks

Askarisichani, Omid, Huang, Elizabeth Y., Musaffar, Abed K., Friedkin, Noah E., Bullo, Francesco, Singh, Ambuj K.

arXiv.org Artificial Intelligence, Nov-5-2025

Discovering the antecedents of individuals' influence in collaborative environments is an important, practical, and challenging problem. In this paper, we study interpersonal influence in small groups of individuals who collectively execute a sequence of intellective tasks. We observe that along an issue sequence with feedback, individuals with higher expertise and social confidence are accorded higher interpersonal influence. We also observe that low-performing individuals tend to underestimate their high-performing teammates' expertise. Based on these observations, we introduce three hypotheses and present empirical and theoretical support for their validity. We report empirical evidence on longstanding theories of transactive memory systems, social comparison, and confidence heuristics on the origins of social influence. We propose a cognitive dynamical model inspired by these theories to describe the process by which individuals adjust interpersonal influences over time. We demonstrate the model's accuracy in predicting individuals' influence and provide analytical results on its asymptotic behavior for the case with identically performing individuals. Lastly, we propose a novel approach using deep neural networks on a pre-trained text embedding model for predicting the influence of individuals. Using message contents, message times, and individual correctness collected during tasks, we are able to accurately predict individuals' self-reported influence over time. Extensive experiments verify the accuracy of the proposed models compared to baselines such as the structural balance and reflected appraisal models. While the neural network model is the most accurate, the dynamical model is the most interpretable for influence prediction.
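The abstract does not spell out the authors' cognitive dynamical model, but it names the reflected appraisal model as a baseline. As an illustration of that baseline only, the sketch below assumes the DeGroot-Friedkin form from the opinion-dynamics literature: on each issue, person i keeps self-weight x[i] and spreads the remaining weight according to a fixed, hypothetical relative-interaction matrix C (row-stochastic, zero diagonal). Their self-weight on the next issue is set to the social power they just exerted, i.e. their entry in the normalized left eigenvector of the influence matrix W = diag(x) + (I - diag(x)) C. All function names are hypothetical.

```python
import numpy as np


def influence_matrix(x: np.ndarray, C: np.ndarray) -> np.ndarray:
    """W = diag(x) + (I - diag(x)) C: person i keeps self-weight x[i] and spreads
    the remaining 1 - x[i] over the others according to row i of C."""
    D = np.diag(x)
    return D + (np.eye(len(x)) - D) @ C


def social_power(W: np.ndarray) -> np.ndarray:
    """Normalized left eigenvector of W for eigenvalue 1 (v @ W = v, v.sum() = 1)."""
    vals, vecs = np.linalg.eig(W.T)
    v = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    return v / v.sum()


def reflected_appraisal(x0, C: np.ndarray, n_issues: int) -> np.ndarray:
    """Self-weights over a sequence of issues; row s holds x(s)."""
    x = np.asarray(x0, dtype=float)
    history = [x]
    for _ in range(n_issues):
        # Each person's next self-weight equals the social power they just exerted.
        x = social_power(influence_matrix(x, C))
        history.append(x)
    return np.array(history)


if __name__ == "__main__":
    C = (np.ones((3, 3)) - np.eye(3)) / 2  # everyone weighs the other two equally
    print(np.round(reflected_appraisal([0.2, 0.3, 0.5], C, 10), 3))
```

With three people who weigh each other equally, the self-weights drift toward 1/3 each over successive issues.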

artificial intelligence, deep learning, machine learning

arXiv.org Artificial Intelligence

2011.07168

Country: North America > United States > California > Santa Barbara County > Santa Barbara (0.14)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Education (0.46)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

An Effective Flow-based Method for Positive-Unlabeled Learning: 2-HNC

Hochbaum, Dorit, Nitayanont, Torpong

arXiv.org Artificial Intelligence, Nov-4-2025

In many binary classification scenarios, only positive instances are labeled in the training data, leaving the rest of the data unlabeled. This setup, known as positive-unlabeled (PU) learning, is addressed here with a network flow-based method that utilizes pairwise similarities between samples. The method we propose, 2-HNC, leverages Hochbaum's Normalized Cut (HNC) and the set of solutions it provides by solving a parametric minimum cut problem. The set of solutions, which are nested partitions of the samples into two sets, corresponds to varying tradeoff values between two goals: high intra-similarity within each set and low inter-similarity between the two sets. This nested sequence is used to rank the unlabeled samples by their likelihood of being negative. Building on this insight, 2-HNC proceeds in two stages. The first stage generates this ranking without assuming any negative labels, using a problem formulation that is constrained only on the positive labeled samples. The second stage augments the positive set with likely-negative samples and recomputes the classification. The final label prediction selects, among all partitions generated in both stages, the one whose positive-class proportion is closest to a prior estimate of this quantity, which is assumed to be given. Extensive experiments across synthetic and real datasets show that 2-HNC yields strong performance and often surpasses existing state-of-the-art algorithms.
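The final selection rule above is stated precisely, while the ranking step is described only in outline. The sketch below covers just this post-processing and assumes the parametric minimum cut has already produced a list of nested boolean labelings (True = positive side) whose positive sets shrink along the list. It ranks samples by how early they leave the positive side (an assumed reading of "ranking by likelihood of being negative") and picks the labeling whose positive fraction is closest to the given prior. Computing the cuts themselves (HNC) is not shown, and both function names are hypothetical.

```python
import numpy as np


def negativity_ranking(partitions: list[np.ndarray]) -> np.ndarray:
    """Sample indices ordered from most to least likely negative.

    partitions: boolean arrays (True = positive side), assumed nested so that
    the positive set only shrinks along the list.
    """
    P = np.array(partitions, dtype=bool)  # shape (m, n)
    neg = ~P
    m = P.shape[0]
    # Index of the first partition that puts each sample on the negative side;
    # samples that never leave the positive side get m and are ranked last.
    first_neg = np.where(neg.any(axis=0), neg.argmax(axis=0), m)
    return np.argsort(first_neg, kind="stable")


def select_partition(partitions: list[np.ndarray], prior_pos: float) -> int:
    """Index of the partition whose positive-class proportion is closest to prior_pos."""
    fractions = np.array([np.mean(p) for p in partitions])
    return int(np.argmin(np.abs(fractions - prior_pos)))


if __name__ == "__main__":
    T, F = True, False
    parts = [np.array(row) for row in ([T, T, T, T], [T, F, T, T], [T, F, T, F], [T, F, F, F])]
    print(negativity_ranking(parts), select_partition(parts, prior_pos=0.5))
```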

data mining, machine learning, natural language

arXiv.org Artificial Intelligence

2505.08212

Country: North America > United States > California (0.28)

Genre: Research Report > Experimental Study (0.71)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

A Multi-Evidence Framework Rescues Low-Power Prognostic Signals and Rejects Statistical Artifacts in Cancer Genomics

Akarlar, Gokturk Aytug

arXiv.org Artificial Intelligence, Oct-22-2025

Motivation: Standard genome-wide association studies in cancer genomics rely on statistical significance with multiple testing correction, but systematically fail in underpowered cohorts. In TCGA breast cancer (n=967, 133 deaths), the low event rate (13.8%) creates severe power limitations, producing false negatives for known drivers and false positives for large passenger genes.

Results: We developed a five-criteria computational framework integrating causal inference (inverse probability weighting, doubly robust estimation) with orthogonal biological validation (expression, mutation patterns, literature evidence). Applied to TCGA-BRCA mortality analysis, standard Cox+FDR detected zero genes at FDR<0.05, confirming complete failure in underpowered settings. Our framework correctly identified RYR2, a cardiac gene with no cancer function, as a false positive despite nominal significance (p=0.024), while identifying KMT2C as a complex candidate requiring validation despite marginal significance (p=0.047, q=0.954). Power analysis revealed a median power of 15.1% across genes, with KMT2C achieving only 29.8% power (HR=1.55), explaining its borderline statistical significance despite strong biological evidence. The framework distinguished true signals from artifacts through mutation pattern analysis: RYR2 showed 29.8% silent mutations (passenger signature) and no hotspots, while KMT2C showed 6.7% silent mutations and 31.4% truncating variants (driver signature). This multi-evidence approach provides a template for analyzing underpowered cohorts, prioritizing biological interpretability over purely statistical significance.

Availability: All code and analysis pipelines are available at github.com/akarlaraytu/causal-inference-for-cancer-genomics
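Two quantities in this abstract can be illustrated in outline. FDR control is commonly done with the Benjamini-Hochberg procedure (the abstract does not name the procedure, so this is an assumption), which converts p-values into q-values; when thousands of genes are tested, a nominal p of 0.047 can end up with a q near 1. For power, the sketch assumes Schoenfeld's event-based approximation for a Cox model with a binary covariate (e.g., mutated vs. wild type): power ≈ Φ(√(d·p(1−p))·|log HR| − z₁₋α/₂), where d is the number of deaths and p the mutation prevalence. The prevalence is a hypothetical input not reported in the abstract, so the sketch does not reproduce the 15.1% or 29.8% figures; function names are hypothetical.

```python
import math
from statistics import NormalDist


def bh_qvalues(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values capped at 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


def cox_power(events: int, prevalence: float, hazard_ratio: float, alpha: float = 0.05) -> float:
    """Approximate two-sided power of a Cox test for a binary covariate (Schoenfeld).

    power = Phi(sqrt(d * p * (1 - p)) * |log HR| - z_{1 - alpha/2})
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    signal = math.sqrt(events * prevalence * (1 - prevalence)) * abs(math.log(hazard_ratio))
    return NormalDist().cdf(signal - z)


if __name__ == "__main__":
    print([round(v, 3) for v in bh_qvalues([0.001, 0.024, 0.047, 0.3, 0.8])])
    # Hypothetical 10% mutation prevalence; the abstract does not report it.
    print(round(cox_power(events=133, prevalence=0.10, hazard_ratio=1.55), 3))
```

Because power grows with √(d·p(1−p)), a cohort with only 133 deaths leaves rare mutations with little chance of reaching significance even at a hazard ratio of 1.55.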

artificial intelligence, machine learning, mutation

arXiv.org Artificial Intelligence

2510.18571

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.91)

Quantifying Statistical Significance of Deep Nearest Neighbor Anomaly Detection via Selective Inference

28dad4a70f748a2980998d3ed0f1b8d2-Supplemental-Conference.pdf

1b8726b572e0dfa72793f9f6590664fd-Supplemental-Datasets_and_Benchmarks_Track.pdf

2b515e2bdd63b7f034269ad747c93a42-Supplemental.pdf

Generalist Large Language Models Outperform Clinical Tools on Medical Benchmarks

Fairness Evaluation of Large Language Models in Academic Library Reference Services

Expertise and confidence explain how social influence evolves along intellective tasks

An Effective Flow-based Method for Positive-Unlabeled Learning: 2-HNC

A Multi-Evidence Framework Rescues Low-Power Prognostic Signals and Rejects Statistical Artifacts in Cancer Genomics